ISIS and Meta Projects: Progress Report


Kenneth Birman, Robert Cooper, and Keith Marzullo* 
February 22, 1990 


Isis and Meta are two distributed systems projects at Cornell University.
The Isis project, led by Ken Birman, has developed a new methodology,
virtual synchrony, for writing robust distributed software. This approach
is directly supported by the Isis Toolkit, a programming system that is
distributed to over 300 academic and industrial sites. As the basic Isis
techniques have matured, we have focused increasingly on some of the re- 
maining “hard problems” of reliable distributed programming. Principally 
these include high performance multicast, large scale applications, and wide 
area networks. We are also developing several interesting applications that 
exploit the strengths of Isis, including an NFS-compatible replicated file 
system. 

The Meta project, led by Keith Marzullo, is about distributed control 
in a soft real-time environment incorporating feedback. This domain en- 
compasses examples as diverse as monitoring inventory and consumption on 
a factory floor, and performing load balancing on a distributed computing
system. One of the first uses of Meta is for distributed application manage- 
ment: the tasks of configuring a distributed program, dynamically adapting 
to failures, and monitoring its performance. 

This article reports our recent progress and current plans. First, however,
we explain our approach to distributed computing, a philosophy
that we believe significantly distinguishes our work from that of others in 
the field. 

*This material is adapted from a short paper presented at the Workshop on Mission
Critical Operating Systems, Washington, Nov. 1989. This work was supported by the
Defense Advanced Research Projects Agency (DoD) under ARPA order 6037, Contract 
N00140-87-C-8904 and under DARPA/NASA subcontract NAG2-593 administered by 
the NASA Ames Research Center. The views, opinions, and findings contained in this 
report are those of the authors and should not be construed as an official Department of 
Defense position, policy, or decision. 

Network transparency: Too much of a good thing?

Users of contemporary distributed computing systems rapidly discover how 
similar such systems are to the timeshared machines of the 1970’s: the 
pervasive use of “network transparency” techniques lets us largely ignore 
the fact of distribution. Normally, this is a desirable property. For example, 
the dominant distributed programming technology, remote procedure calls 
(RPC), permits a program running on one machine to invoke a procedure 
residing in some other program. Given adequate language support, an RPC 
interface can hide many details of message-based interaction and connection 
management from the user. The idea of transparency also extends to other 
parts of a typical distributed system. Using a file system like NFS, a program 
can operate on files that physically reside on a remote machine in the network 
using the same interface as for local files. 

Complete transparency is troubling, however, when one considers the
many reasons that distributed computing should be different from non- 
distributed programming. Parallel computing is in many ways analogous to 
distributed computing. Yet, whereas the effective use of parallel machines 
has triggered a search for fundamentally new programming languages and 
methodologies, this has not happened for distributed programs. If we are 
building distributed systems using technologies that proved unsatisfactory 
in parallel settings, is it not likely that our distributed systems are making 
ineffective use of parallelism? 

The requirements placed on a distributed application often go beyond 
the exploitation of concurrency. In particular, one often wishes to monitor 
and control a distributed computer system while it is running. Moreover, 
a distributed system may need to remain operational in the presence of
partial failures. By this we mean situations where one of the machines 
connected to a network fails or becomes partitioned from the others, while 
the majority of the machines remain operational and must reconfigure and 
continue executing. The complementary problem also arises, of reintegrating 
a recovered machine into an online system. 

The Isis project is based on the premise that when we pretend that a 
distributed system is really a timeshared system, or encourage the user to 
program as if his or her application were the only process running on the 
system, as with transactional RPC, we discard a powerful resource: the 
fact of distribution itself. We lose the ability to employ a set of processes 
in a coordinated, cooperative attack on a problem. We lose the ability to 
apply highly adaptive, reconfigurable solutions to applications that must 
remain online in the presence of failures and recoveries. And we make it
difficult to build a distributed system that is more fault-tolerant and offers
higher performance than any of its components. The Isis Toolkit, and the 
Meta system that we are now building on top of the Toolkit environment, 
represent a significant step towards addressing these sorts of issues. 

The ISIS Toolkit: Process groups and multicast 

At the lowest level, the Isis system provides a toolkit of distributed pro- 
gramming techniques. This consists of a layer of software to assist the 
programmer in building distributed applications. The toolkit is very much 
like an extension of the operating system, although implemented without 
changes to the operating systems on which Isis runs. 

Central to Isis is the notion of a process group. These groups are a 
lightweight programming construct: a single process can belong to arbitrar- 
ily many groups, and there is minimal overhead in being a member of a 
group. A process can dynamically join and leave groups, and groups can 
span multiple machines. Groups have a hierarchical namespace, much like a
file system namespace, and permit flexible, location-transparent addressing.

Isis provides multicast and unicast (point-to-point) communication primitives
that are easy to use and adapt to the needs of the programmer.
A multicast can be directed to all members of a group, and zero or more 
will respond, depending on the needs of the particular application. 
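
The group operations just described can be illustrated with a minimal sketch. It is a toy, single-process simulation written for this report's readers, not the Isis Toolkit interface: the class name `ProcessGroup`, its methods, and the group name `/services/printing` are all hypothetical. It shows dynamic join and leave, and a multicast to which zero or more members reply.

```python
from typing import Any, Callable, Dict, List, Optional

# A handler receives a message and may return a reply, or None to stay silent.
Handler = Callable[[Any], Optional[Any]]


class ProcessGroup:
    """Toy in-memory stand-in for a process group (hypothetical, not the Isis API)."""

    def __init__(self, name: str) -> None:
        self.name = name                       # hierarchical, path-like name
        self.members: Dict[str, Handler] = {}  # member id -> message handler

    def join(self, member: str, handler: Handler) -> None:
        self.members[member] = handler

    def leave(self, member: str) -> None:
        self.members.pop(member, None)

    def multicast(self, message: Any) -> List[Any]:
        """Deliver to every current member; collect only the replies given."""
        replies = []
        for handler in list(self.members.values()):
            reply = handler(message)
            if reply is not None:
                replies.append(reply)
        return replies


group = ProcessGroup("/services/printing")
group.join("p1", lambda msg: "p1 ready" if msg == "status?" else None)
group.join("p2", lambda msg: None)  # receives every message, never replies
group.join("p3", lambda msg: "p3 ready" if msg == "status?" else None)

replies_before = group.multicast("status?")  # p1 and p3 reply
group.leave("p3")
replies_after = group.multicast("status?")   # only p1 replies now
```

The caller addresses the group by name and never needs to know which machines host its members; that is the location transparency the text refers to.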

Concurrent multicasts and unicasts, dynamic group changes, and failures
would seem to present a very complex, even daunting, execution environment.
But in Isis all these concurrent events appear to happen one at a time. We
call this simplifying model virtual synchrony.

Virtual synchrony 

Virtual synchrony is a general approach to solving distributed computing 
problems. Derived in part from the state machine approach (introduced 
by Lamport and Schneider), virtual synchrony permits the programmer to 
design a distributed program for execution in a simplified environment, in 
which all processes appear to observe events simultaneously and therefore in 
the same order. Events such as multicast and detection of failures are atomic 
in a virtually synchronous setting: all group members receive a message or 
observe a failure if any does, and in the same consistent ordering. The 
synchronous abstraction is relaxed when the program is executed by Isis 
using application-specific knowledge. Isis has several multicast primitives
that differ in the kind of ordering they enforce on concurrent events. By 
selecting the appropriate primitive, the programmer tells Isis what degree 
of synchrony is needed for that part of their application. 
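
To make the "same consistent ordering" property concrete, here is a small sketch of one textbook way to obtain a total order: a single sequencer stamps each event, and every member holds back out-of-order arrivals. This is an illustration under our own assumptions, not a description of the protocols Isis uses; the names `Sequencer` and `Member` are hypothetical. A failure notification is stamped like any other message, so all members see it at the same point in the event sequence.

```python
from typing import Dict, List, Tuple


class Sequencer:
    """Assigns one global sequence number to each multicast event."""

    def __init__(self) -> None:
        self.next_seq = 0

    def stamp(self, payload: str) -> Tuple[int, str]:
        stamped = (self.next_seq, payload)
        self.next_seq += 1
        return stamped


class Member:
    """Holds back out-of-order arrivals and delivers in sequence order."""

    def __init__(self) -> None:
        self.expected = 0                # next sequence number to deliver
        self.held: Dict[int, str] = {}   # arrivals not yet deliverable
        self.delivered: List[str] = []

    def receive(self, stamped: Tuple[int, str]) -> None:
        seq, payload = stamped
        self.held[seq] = payload
        # Deliver every event that is now contiguous with what we have seen.
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
m0 = sequencer.stamp("deposit 100")
m1 = sequencer.stamp("member C failed")  # failures are ordered like messages
m2 = sequencer.stamp("withdraw 30")

a, b = Member(), Member()
for stamped in (m0, m1, m2):  # the network happens to preserve order for a
    a.receive(stamped)
for stamped in (m2, m0, m1):  # and scrambles it for b
    b.receive(stamped)
```

Both members end with identical delivery histories even though the arrivals differed. A weaker primitive would relax the hold-back rule (for example, ordering only causally related events), trading synchrony for lower delay where the application permits it.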

Virtual synchrony permits the Isis programmer to work in an environ- 
ment where many of the aspects that render distributed computing diffi- 
cult do not arise, but the resulting program runs as asynchronously (and 
fault-tolerantly) as may be desired, without compromising correctness. Virtual
synchrony has been exploited throughout Isis, and leads to a simple
step-by-step programming style that even relatively unskilled programmers 
can follow. Taken together with the wide range of tools represented in the 
toolkit, the approach leads to a major jump in programmer productivity, 
and major improvements in the robustness of distributed software. 

Virtual synchrony has a well-developed theory, principally through the
work of Ph.D. graduate Frank Schmuck, that explains when low-cost asyn- 
chronous techniques can be used to implement virtual synchrony. More 
recently, we have explored the relationship between global correctness and 
consistency properties in distributed systems and the ordering mechanisms 
needed to achieve them; a technical report on this subject is listed below. 

ISIS Toolkit: Problem-specific tools 

Using the process group, multicast and unicast primitives, Isis provides a 
variety of higher level tools that solve common subproblems in distributed 
computing. For example, tools are provided to:

• Manage replicated data in memory or on a disk file 

• Split a computation among several machines to exploit parallelism 

• Coordinate an external action such as operating independent welding 
units that are jointly welding an automobile body 

• Synchronize concurrent actions such as when several processes share 
a resource that only one can use at a time 

• Monitor the status of a computation, process or computer, triggering 
user-programmed actions should it fail 

• Dynamically reconfigure to adapt after a failure or to integrate a recovered
machine into an operational system, restarting services that
should run at that location and bringing them up to date concerning
the active state of the system.

This is just a partial list. Moreover, the tools are integrated with each other 
in a way that makes it easy to obtain consistent behavior even when several 
processes must react independently to the same event. 
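
Two of the tools listed above, replicated data and reintegration of a recovered machine, can be sketched together. The code below is a simplified, single-process illustration under our own assumptions; `Replica` and `ReplicatedTable` are hypothetical names, not Toolkit routines. A joining replica first receives a copy of the current state from an existing member, and thereafter every operational replica applies the same updates.

```python
import copy
from typing import Dict, Optional


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, int] = {}

    def apply(self, key: str, value: int) -> None:
        self.state[key] = value


class ReplicatedTable:
    """Toy replicated key/value table with state transfer on join."""

    def __init__(self) -> None:
        self.replicas: Dict[str, Replica] = {}

    def join(self, replica: Replica) -> None:
        # Bring the newcomer up to date from any current member before it
        # starts receiving updates.
        donor: Optional[Replica] = next(iter(self.replicas.values()), None)
        if donor is not None:
            replica.state = copy.deepcopy(donor.state)
        self.replicas[replica.name] = replica

    def fail(self, name: str) -> None:
        self.replicas.pop(name, None)

    def update(self, key: str, value: int) -> None:
        # Every operational replica applies the same update.
        for replica in self.replicas.values():
            replica.apply(key, value)


table = ReplicatedTable()
r1, r2 = Replica("r1"), Replica("r2")
table.join(r1)
table.join(r2)
table.update("x", 1)
table.fail("r2")            # r2 crashes and misses the next update
table.update("y", 2)
r2_recovered = Replica("r2")
table.join(r2_recovered)    # state transfer brings it up to date
table.update("z", 3)
```

In a real distributed setting the join, the state transfer, and concurrent updates must be ordered consistently at all members, which is the role virtual synchrony plays.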

Isis Version 1.3.1 

The present version of the Isis toolkit can be used from C, C++, and For- 
tran. Common Lisp interfaces are available for several versions of this lan- 
guage. Isis runs on (and between) Sun, DEC, HP, Gould, NeXT and Apollo 
equipment, and on and between several versions of Unix (including Mach, AIX,
HP-UX and UNICOS). Ports to DEC’s VMS system and IBM’s VM operating
system are being considered, as is an interface to PCs running OS/2.

The Isis Toolkit is in increasingly wide use, and our group has distributed 
more than 300 copies of the source for Isis V1.3.1. Among the users of the
current system are a number of Fortune 500 companies, several industrial 
research and prototyping groups, and a number of academic researchers and 
instructors. Applications include controlling a world-wide nuclear test ban
and seismic monitoring system, automating a factory-floor VLSI fabrication
system, disseminating quotes and other real-time data in brokerage settings,
and CAE/CAM systems. This diverse user base has been a source of
invaluable feedback. 


Isis Version 2.0 

Although Isis V1.3.1 has proved extremely robust, it is also sluggish and
hard to scale. Isis V2.0, which will soon be released, overcomes these
limitations while preserving the robustness of V1.3.1. With regard to performance,
V2.0 includes a new “bypass” communication protocol suite, which permits 
group communication at hardware speeds and enables the application de- 
signer to introduce new multicast transport algorithms that exploit special 
hardware or software features, or offer special properties such as real-time 
delivery guarantees. This facility represents a major advance for our group, 
and yields multicasts that are a match for alternative approaches that lack 
Isis’s atomicity and ordering guarantees. We feel that it overcomes the 
widespread concern that fault tolerance may simply be too costly a price
to pay in “real” distributed systems. On the contrary, we now feel that
developers who build on a conventional software substrate are limiting their 
options, working with unnecessarily complex message-at-a-time interfaces,
and not even gaining a performance advantage by doing so. 

With regard to scale, Isis V2.0 has two significant extensions that re- 
spond to the most urgent needs identified by our users. One permits us to 
connect applications on computers that don’t run Isis to the Isis system 
as remote clients. The interface is largely transparent to the application 
designer and imposes little overhead. In the initial implementation of re- 
mote clients, the remote Isis server may introduce a common failure point 
for those computers that are its remote clients. We plan to increase the 
fault tolerance of this mechanism by permitting a remote client to switch
dynamically between Isis servers in case of failure. Nevertheless, the current 
implementation is a good match for diskless workstations where a client’s 
remote disk server machine will also act as its Isis server process. 
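
The planned failover behavior for remote clients might look roughly like the following sketch. It is only an illustration of the idea of switching servers on failure; the function `call_with_failover`, the exception `ServerDown`, the simulated transport, and the server names are all hypothetical and do not correspond to any Isis interface.

```python
from typing import Callable, Dict, List


class ServerDown(Exception):
    """Raised by a (hypothetical) server stub that cannot be reached."""


def call_with_failover(servers: List[str],
                       send: Callable[[str, str], str],
                       request: str) -> str:
    """Try each server in turn; return the first successful reply."""
    for server in servers:
        try:
            return send(server, request)
        except ServerDown:
            continue  # this server failed: switch to the next one
    raise ServerDown("no server reachable")


# Simulated transport: a table recording which servers are currently up.
up: Dict[str, bool] = {"server-a": False, "server-b": True}


def fake_send(server: str, request: str) -> str:
    if not up[server]:
        raise ServerDown(server)
    return f"{server} handled {request}"


reply = call_with_failover(["server-a", "server-b"], fake_send, "join group")
```

With a single configured server, as in the initial implementation described above, the list has one entry and that server remains a common failure point for its clients.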

A second extension permits users to develop services that span wide- 
area networks, residing on multiple Isis local networks and communicating 
infrequently and asynchronously. For example, the large-scale seismology 
system cited earlier uses this facility to keep track of the location of files 
containing signal analysis output and to transfer these files from one Isis 
system to another. The long-distance circuits are set up periodically, used 
intensively, and then closed down to minimize communications costs. 

ISIS applications 

In order to exercise and evaluate Isis, we have developed several fault-tolerant
applications. For example, we have built a distributed, fault-tolerant
version of the Unix program make, a main-memory distributed relational 
database, and a multi-user spreadsheet that can be used in a cooperative 
manner. The first two of these applications are available as part of the Isis 
V2.0 release, and the spreadsheet should be available by the end of 1990. 

As part of his research, graduate student Alex Siegel has been designing 
and building a highly available file system called Deceit. This file system is
completely compatible with NFS, yet uses replication for improved response 
time, higher availability and better scaling. Additionally, Deceit allows the 
clients to specify properties of individual files in order to tune access to the 
file. Currently, Deceit is running in a prototype form. It outperforms NFS 
for many operations (notably read and write), and equals NFS for almost 
all other operations.

Using Isis for large scale applications

Many systems that support process groups assume that any single appli- 
cation will use at most one group. Most Isis applications employ several 
groups, and many use large hierarchically structured groups. This is ex- 
plained by two factors. First, the trend toward modularity and object- 
oriented programming in distributed systems leads many designers to think 
of a process group as a form of distributed object. Even if the components 
of the group are coded in different languages or have differing functionality, 
this proves to be a simplifying and powerful structuring methodology. Since 
a single process may make use of several services, each implemented using 
such a process group, it is not uncommon for a single process to belong to 
many groups. 

A second factor is concerned with scale: Isis users are building surprisingly
large distributed applications, with groups that contain many
processes. It is unusual, and unwise, to multicast to the entire membership 
of such a large group, except where widespread dissemination of information 
is fundamental to the application itself. (This might be the case for a
distributed network news application, for example.) Designers of large systems
are thus led to use a hierarchical structure in which a large group contains
a number of smaller groups. These smaller groups are chosen in such a way 
that most multicasts are destined to just one or two such groups. 
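
The cost argument behind this structure can be seen in a small sketch. It is a toy model under our own assumptions, with hypothetical names (`Subgroup`, `HierarchicalGroup`, and the factory-style subgroup names); it simply counts deliveries to show that a multicast aimed at one subgroup touches far fewer processes than a broadcast to the whole group.

```python
from typing import Dict, List


class Subgroup:
    def __init__(self, name: str, members: List[str]) -> None:
        self.name = name
        self.members = members
        self.inbox: Dict[str, List[str]] = {m: [] for m in members}

    def multicast(self, message: str) -> int:
        """Deliver to each member of this subgroup; return deliveries made."""
        for member in self.members:
            self.inbox[member].append(message)
        return len(self.members)


class HierarchicalGroup:
    """A large group maintained as a collection of smaller subgroups."""

    def __init__(self, subgroups: List[Subgroup]) -> None:
        self.subgroups = {s.name: s for s in subgroups}

    def multicast_to(self, names: List[str], message: str) -> int:
        """Send to the named subgroups only; return deliveries made."""
        return sum(self.subgroups[n].multicast(message) for n in names)

    def broadcast(self, message: str) -> int:
        return self.multicast_to(list(self.subgroups), message)


plant = HierarchicalGroup([
    Subgroup("line-1", [f"l1-p{i}" for i in range(50)]),
    Subgroup("line-2", [f"l2-p{i}" for i in range(50)]),
    Subgroup("shipping", [f"sh-p{i}" for i in range(20)]),
])
targeted = plant.multicast_to(["shipping"], "pallet ready")  # 20 deliveries
everyone = plant.broadcast("shutdown at 18:00")              # 120 deliveries
```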

Responding to these needs, Robert Cooper has designed a suite of hier- 
archical process group tools for Isis. These extend the basic tools to oper- 
ate transparently on hierarchical groups, while augmenting the system with 
mechanisms for reliably broadcasting to a large group that is maintained 
hierarchically. A prototype of this facility is nearing completion. 


High performance multicast 

Looking to the future, we are exploring a number of theoretical and prac- 
tical topics at the Toolkit level. The practical ones include adding a better 
security mechanism to the system, extending Isis to support real-time pro- 
tocols and other special-purpose protocols, and integrating the system into 
environments with parallel processors and extremely high speed communica- 
tions protocols. Mechanisms for exploiting new operating systems, such as 
Chorus and Mach, also represent an appealing direction. Graduate student 
Patrick Stephenson has developed a class of extremely high performance 
multicast transport protocols suitable for use in the new Isis system, and
we plan to combine these with an Isis service that is knowledgeable
about the communication topology of a local area network to develop
a suite of protocols that adapt themselves to the environment, for instance
exploiting Ethernet multicast when possible. We also hope to scale the size
of the local area network on which Isis may run from the current limit of about
64 nodes up to hundreds or thousands of nodes, by introducing hierarchy at
the lowest levels of the Toolkit. The Toolkit architecture now seems fairly
stable, and is unlikely to change in visible ways as these extensions are made. 

The META system 

We mentioned above that Isis involves software at several levels. The toolkit 
is a low-level technology, for use by programmers who actually code dis- 
tributed programs. The Meta system is a collection of higher level tools 
that aid in gluing together distributed programs into a reliable and adaptive 
distributed system. 

At the core of Meta is a set of routines that support building reliable 
reactive systems, such as factory floor management systems, process control
systems, and the control aspects of distributed applications. This level
provides a platform that can be used to monitor and control a distributed 
system. Supported at this level are routines for instrumenting a distributed 
system, monitoring for (perhaps complex) real-time conditions, and trigger- 
ing actions on the controlled system. 

There are two interfaces to the sensor/actuator platform. The low-level 
interface permits users to define raw sensors and actuators, namely routines 
(or variables) in user programs that can be queried to obtain the current sen- 
sor value. At this level, META also supports an entity-relationship database 
model describing sensors, their real-time properties, and the relationships 
between them. Some raw sensors are predefined, such as the ones giving the 
load on a computer or a process, while others can be defined dynamically, 
such as the length of a job queue maintained by some software component 
of a larger system. Also supported are mechanisms for composing multiple 
raw sensors into an abstract sensor. This is used to define such properties 
as the average over a set of sensors, as well as to support sensors tolerant of 
certain classes of failures. 
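
As a rough illustration of raw and abstract sensors, consider the sketch below. It models a raw sensor as a function that returns the current value when queried, and builds abstract sensors by composition. The names (`Sensor`, `average_sensor`, `median_sensor`) and the choice of a median as the failure-tolerant combination are our assumptions for the example, not Meta's actual interface.

```python
import statistics
from typing import Callable, List

# A raw sensor is any routine that can be queried for its current value.
Sensor = Callable[[], float]


def average_sensor(sensors: List[Sensor]) -> Sensor:
    """Abstract sensor: the average over a set of sensors."""
    return lambda: statistics.fmean([s() for s in sensors])


def median_sensor(sensors: List[Sensor]) -> Sensor:
    """Abstract sensor that tolerates a minority of wildly wrong readings."""
    return lambda: statistics.median([s() for s in sensors])


# A dynamically defined raw sensor: the length of a job queue.
job_queue = ["job1", "job2", "job3"]


def queue_length() -> float:
    return float(len(job_queue))


# Three load readings, one of them from a faulty sensor.
loads: List[Sensor] = [lambda: 0.5, lambda: 0.7, lambda: 99.0]
robust_load = median_sensor(loads)

mean_of_three = average_sensor([lambda: 1.0, lambda: 2.0, lambda: 3.0])
```

Because each sensor is evaluated only when queried, `queue_length()` always reflects the queue's current contents, and `robust_load()` returns 0.7 despite the faulty reading.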

The high-level interface to META is concerned with querying and mon- 
itoring sensors. This supports a Prolog-like query language for identifying 
individual sensors and sets of sensors satisfying user-defined predicates, as 
well as a trigger language whereby the user can monitor for events of interest,
triggering appropriate actions when the event is detected. Both of these
interfaces are provided at the language level. 
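
The trigger idea can be sketched in a few lines. This is not Meta's trigger language, whose syntax is not given here; it is a hypothetical Python stand-in in which a trigger pairs a condition over sensor readings with an action, and fires when the condition changes from false to true. The sensor names `load` and `queue_length` are invented for the example.

```python
from typing import Callable, Dict, List

Readings = Dict[str, float]


class Trigger:
    """Fires its action when the condition changes from false to true."""

    def __init__(self, condition: Callable[[Readings], bool],
                 action: Callable[[Readings], None]) -> None:
        self.condition = condition
        self.action = action
        self.active = False  # whether the condition held at the last poll

    def poll(self, readings: Readings) -> None:
        holds = self.condition(readings)
        if holds and not self.active:
            self.action(readings)
        self.active = holds


log: List[str] = []
overload = Trigger(
    condition=lambda r: r["load"] > 0.9 and r["queue_length"] > 10,
    action=lambda r: log.append(f"migrate jobs (load={r['load']})"),
)

for sample in (
    {"load": 0.5, "queue_length": 12},
    {"load": 0.95, "queue_length": 12},  # condition becomes true: fire
    {"load": 0.97, "queue_length": 15},  # still true: no second firing
    {"load": 0.4, "queue_length": 2},
):
    overload.poll(sample)
```

Firing on the transition rather than on every poll is a design choice made for this sketch; it keeps a persistent overload from launching the same corrective action repeatedly.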

Built on the basic platform are a number of facilities for actually manag- 
ing distributed applications. These help manage the allocation of system re- 
sources, control the initiation, migration, and termination of programs, and 
monitor the performance of the system. One interface to this distributed
system manager is a powerful graphical front end: using
this facility, one can achieve sophisticated fault-tolerant behavior without
writing a line of code. 

Parts of Meta are currently available, while other parts are still being 
built. The Meta platform of sensor and actuator facilities, built by graduate
student Mark Wood, is provided in Isis V2.0, and the design of our
sensor query language is complete; an implementation is expected to be finished
during 1990. Visitor Robbert van Renesse has developed, on top of
Meta, a distributed application management program called Garp, which 
is a prototype graphical monitoring and control program. This system will 
also be released sometime in 1990. 

Support for the software 

Although Isis is an academic project, it has acquired an increasingly large 
commercial following. At present, all of the academically developed Isis 
software is freely available in the public domain. We have made a major 
effort to provide high quality support for this software, and believe we have 
an excellent record of responsiveness, and of success in tracking down and
fixing bugs. On the other hand, this sort of commercial responsiveness is 
making it increasingly difficult to maintain an active research program. 

To address this problem, we have formed a company, Isis Distributed 
Systems Incorporated, which is offering commercial services to companies 
in need of customized software or consulting. Starting in 1990, these will 
include support for the Isis Toolkit and products that extend the Toolkit 
to respond to some of the specialized demands of our user group. For ex- 
ample, IDS is now building a collection of general purpose software tools for 
one client whose application demands certain specialized components that 
Meta currently lacks. In this particular case, the resulting software will 
eventually enter the Isis public distributions. However, IDS is also engaged 
in proprietary software development, and is intended to operate as an in- 
creasingly autonomous commercial operation, freeing our research group to 
focus on research.


Obtaining ISIS 

To obtain information about Isis, or a copy of the current software distri- 
bution, write to: The Isis Project, Department of Computer Science, 4105 
Upson Hall, Cornell University, NY 14853 (607-255-9198), or send elec- 
tronic mail to isis@cs.cornell.edu. The group also maintains a mailing list 
to which announcements of all new papers are sent. 